Accomplishments: Let us first recall that Engineering Awareness is, in our vision, the
ability to engineer systems in which effective situational awareness is possible. A fundamental
challenge in this art is to be able to effectively monitor environments. Examples of
environments include networked computer systems, autonomic computing systems and distributed
and dynamic information systems. In our approach an environment consists, in its
most  abstract  form,  of  multiple  processes  or  behaviors  that  we  typically  model  as  Finite  State 
Machines  such  as  Probabilistic  and  nonprobabilistic  Finite  State  Automata  (DFAs/PFAs), 
Probabilistic  Deterministic  Finite  State  Automata  (PDFAs),  Probabilistic  Suffix  Automata 
(PSAs),  Hidden  Markov  Models  (HMMs),  etc.  The  main  goal  of  the  project  has  been  to 
establish  fundamental  scientific  and  engineering  results  to  meet  the  challenge. 

Given  this  premise  the  description  of  our  main  contributions  can  be  articulated  along  the 
following  four  points: 

1. Process Trackability: we introduced a rigorous notion of trackability of processes/behaviors
with sensor networks.

We  developed  a  quantitative  theory  of  trackability  of  weak  models  that  investigates 
the  rate  of  growth  of  the  number  of  consistent  tracks  given  a  temporal  sequence  of 
observations  made  by  the  sensor  network. 

The  phenomenon  being  tracked  is  modeled  by  a  nondeterministic  finite  automaton  (a 
weak  model)  and  the  sensor  network  is  modeled  by  an  observer  capable  of  detecting 
events related, typically ambiguously, to the states of the underlying automaton. Formally,
an input string of symbols (the sensor network observations) that is presented
to a nondeterministic finite automaton, M, (the weak model) determines a set of state
sequences (the tracks or hypotheses) that are capable of generating the input string.
We  study  the  growth  of  the  size  of  this  candidate  set  of  tracks  as  a  function  of  the 
length  of  the  input  string. 
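
The growth of the candidate set can be made concrete with a small sketch. It assumes a hypothetical encoding that is not part of the original text: `transitions` is a 0/1 adjacency matrix of the automaton, and `emits` maps each observation symbol to the set of states that could have produced it. The function counts consistent tracks by dynamic programming rather than by enumerating them.

```python
def count_consistent_tracks(transitions, emits, observations):
    """Count the state sequences of an NFA that can generate `observations`.

    transitions[i][j] is truthy when the automaton may move from state i to j.
    emits[symbol] is the set of states compatible with that observation.
    (Both encodings are illustrative assumptions.)
    """
    n = len(transitions)
    # counts[s] = number of consistent tracks that currently end in state s
    counts = [1 if s in emits[observations[0]] else 0 for s in range(n)]
    for symbol in observations[1:]:
        allowed = emits[symbol]
        counts = [
            sum(counts[i] for i in range(n) if transitions[i][j])
            if j in allowed else 0
            for j in range(n)
        ]
    return sum(counts)


# A sensor that cannot distinguish the two states at all.
blind_sensor = {"a": {0, 1}}

# Every transition allowed: the number of tracks doubles with each observation.
complete = [[1, 1], [1, 1]]
# Only 0->0, 0->1 and 1->1 allowed: the number of tracks grows linearly.
one_way = [[1, 1], [0, 1]]

print(count_consistent_tracks(complete, blind_sensor, "aaaaa"))  # 32
print(count_consistent_tracks(one_way, blind_sensor, "aaaaa"))   # 6
```

The two toy automata show the two regimes discussed in this section: exponential growth for the fully connected model and polynomial (here linear) growth for the one-way model.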

One  key  result  is  that  for  a  given  automaton  and  sensor  coverage,  the  worst-case  rate  of 
growth  is  either  polynomial  or  exponential  in  the  number  of  observations,  indicating  a 
kind  of  phase  transition  in  tracking  accuracy.  Moreover  this  character  can  be  decided  in 
polynomial  time  in  the  size  of  the  representation  of  the  model.  Technically  we  related 
this  problem  to  deciding  whether  the  Joint  Spectral  Radius  of  a  finite  set  of  matrices 
with  entries  in  {0, 1}  is  less  than  or  equal  to  1. 
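
For reference, the joint spectral radius of a finite set of matrices is conventionally defined as in the first line below. The second line sketches how track counting connects to it, under assumptions not stated in the text above: T is the 0/1 transition matrix of the automaton, D_y is the diagonal 0/1 matrix selecting the states compatible with observation y, and A_y = T D_y.

```latex
\rho(\Sigma) = \lim_{k \to \infty} \max_{A_1, \dots, A_k \in \Sigma}
  \left\| A_1 A_2 \cdots A_k \right\|^{1/k},
\qquad \Sigma = \{ A_y = T D_y \}

N(y_1 \dots y_t) = \mathbf{1}^{\top} D_{y_1} A_{y_2} A_{y_3} \cdots A_{y_t} \mathbf{1}
```

Under this reading, the number of tracks N is bounded by a constant times the norm of a product of t - 1 matrices from the set, so its worst-case growth rate is governed by the joint spectral radius.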

These  results  have  applications  to  various  tracking  problems  of  recent  interest  involving 
tracking  phenomena  using  noisy  observations  of  hidden  states  such  as:  sensor  networks, 
computer  network  security,  autonomic  computing  and  dynamic  social  network  analysis. 
These results appeared in a single comprehensive paper [1], which was published
after four years of reviews and nontrivial enhancements in a special 42-page format
(the journal limit being 30 pages).

2.  Process  Learning:  we  devised  a  novel  methodology  to  machine  learn  Hidden  Markov 
Models  (HMMs)  from  observed  (typical)  data.  Our  new  algorithms  are  based  on  the 
non-negative  matrix  factorization  (NMF)  of  higher  order  Markovian  statistics  and  are 
structurally different from the classical Baum-Welch and associated approaches.

At  a  conceptual  level,  our  algorithm  operates  as  follows.  We  first  estimate  the  matrix  of 
an observation sequence’s high-order statistics. This matrix has a natural non-negative
matrix factorization (NMF) which can be interpreted in terms of the probability distribution
of future observations given the current state of the underlying Markov Chain.
Once  estimated,  these  probability  distributions  can  be  used  to  directly  estimate  the 
transition  probabilities  of  the  HMM. 
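
The first two steps above can be sketched as follows. This is not the algorithm of [3]: the layout of the statistics matrix (rows indexed by length-`p` past words, columns by length-`s` future words) is an assumption made for illustration, and the factorization uses generic multiplicative updates for the Frobenius objective as a stand-in. All function and parameter names are hypothetical.

```python
import itertools
import numpy as np


def high_order_stats(seq, alphabet, p, s):
    """Empirical joint frequencies of (length-p prefix, length-s suffix) pairs."""
    prefixes = list(itertools.product(alphabet, repeat=p))
    suffixes = list(itertools.product(alphabet, repeat=s))
    row = {w: i for i, w in enumerate(prefixes)}
    col = {w: j for j, w in enumerate(suffixes)}
    F = np.zeros((len(prefixes), len(suffixes)))
    for t in range(len(seq) - p - s + 1):
        u = tuple(seq[t:t + p])
        v = tuple(seq[t + p:t + p + s])
        F[row[u], col[v]] += 1
    return F / F.sum()


def nmf(F, rank, iters=500, seed=0, eps=1e-12):
    """Approximate F by W @ H with W, H >= 0 using multiplicative updates."""
    rng = np.random.default_rng(seed)
    W = rng.random((F.shape[0], rank)) + eps
    H = rng.random((rank, F.shape[1])) + eps
    for _ in range(iters):
        H *= (W.T @ F) / (W.T @ W @ H + eps)
        W *= (F @ H.T) / (W @ H @ H.T + eps)
    # Rescale so each row of H sums to one and can be read as a
    # distribution over future words; W absorbs the scale.
    scale = H.sum(axis=1, keepdims=True) + eps
    return W * scale.T, H / scale


# Toy usage on a short binary sequence (illustrative data only).
seq = [0, 1, 1, 0, 1, 0, 0, 1] * 50
F = high_order_stats(seq, alphabet=[0, 1], p=2, s=2)
W, H = nmf(F, rank=2)
print(np.round(H, 3))
```

In this sketch each row of `H` plays the role of a distribution over future observations for one hidden state. Turning those rows into transition probabilities is the step described in the last sentence of the paragraph above and is not attempted here.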

Some of these results are contained in [3]. This work has been accepted for publication
and is to appear in IEEE Transactions on Information Theory. The original (first)
submission is publicly available online on arXiv. The final version is about to be
uploaded to arXiv as well.

More papers containing the most recent results are in preparation, including
a  monograph  on  Machine  Learning  Processes. 

3. Process Complexification: we developed new methods to shape network communications
in order to prevent covert transmissions from hiding behind the statistics of
ordinary  traffic. 

The  general  idea  is  the  following.  In  a  local  area  network  we  view  traffic  as  a  stationary 
stochastic process. We then machine learn its stationary k-order statistics and build a
model of a process that shares the same k-order stationary distributions but possesses
different (k + 1)-order stationary distributions. The constructed model can then be
used  to  shape  local  transmissions. 
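
A toy illustration of the underlying idea, for k = 1: two first-order Markov chains over a binary alphabet can share the same 1-order stationary distribution while differing in their 2-order distributions. The chains and names below are hypothetical examples chosen for clarity, not the traffic models developed in the project.

```python
import itertools
import numpy as np


def stationary_distribution(P):
    """Stationary distribution of an ergodic row-stochastic matrix P."""
    vals, vecs = np.linalg.eig(P.T)
    v = np.real(vecs[:, np.argmax(np.real(vals))])
    return v / v.sum()


def block_distribution(P, k):
    """Stationary probability of every length-k word emitted by the chain."""
    pi = stationary_distribution(P)
    dist = {}
    for word in itertools.product(range(len(pi)), repeat=k):
        prob = pi[word[0]]
        for a, b in zip(word, word[1:]):
            prob *= P[a, b]
        dist[word] = prob
    return dist


# "Ordinary" process: independent fair coin flips.
ordinary = np.array([[0.5, 0.5], [0.5, 0.5]])
# "Shaped" process: same symbol frequencies, but strongly persistent.
shaped = np.array([[0.9, 0.1], [0.1, 0.9]])

for k in (1, 2):
    print(k, block_distribution(ordinary, k), block_distribution(shaped, k))
```

Both chains emit each symbol half of the time, so a detector that looks only at 1-order statistics cannot tell them apart, whereas the 2-order statistics differ clearly.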

We have achieved nontrivial results in this direction and presented their
details during the program reviews. This is still work in progress and those results will
soon be submitted for publication.

4.  Design  Methodologies  and  Distributed  Sampling: 

The results of this work were published in [2] and were the culmination of a long-standing
scientific dispute between two different approaches: the Galstyan and Lerman
bottom-up  methodology,  based  on  Statistical  Physics,  and  the  Cybenko  and  Crespi 
top-down  methodology,  evolved  from  classical  control  theory. 

Traditionally, two alternative design approaches have been available to engineers: top-down
and bottom-up. In the top-down approach, the design process starts with specifying
the global system state and assuming that each component has global knowledge
of the system, as in a centralized approach. The solution is then decentralized by
replacing global knowledge with communication. In the bottom-up approach, on the
other hand, the design starts with specifying requirements and capabilities of individual
components, and the global behavior is said to emerge out of interactions among
constituent components and between components and the environment.

We performed a comparative study of the two design methodologies and showed that, under
certain assumptions on the communication and the external environment, both
bottom-up and top-down methodologies produce very similar solutions.

We demonstrated those ideas on a scenario of distributed sampling: in a closed arena
a known number M_0 = R + G of pucks (R red and G green) have been
disseminated in unknown positions. The numbers of either type of puck, R and G, are
unknown and can even change in time. We deploy N robots equipped with a red lamp
and a green lamp to sample the pucks. Each robot can be foraging for one type of puck
at any given time and its foraging state will be displayed by lamp color to other robots.
We assume that robots have a memory buffer of a certain length where they can store
their recent observations of pucks and other robots. The goal of the application is to
have, on average, the same proportion of red to green robots as the proportion of red
to green pucks in the arena. The task is to define color-selection rules based on robots’
memory and interaction with other robots and/or the environment.
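
A minimal simulation sketch of this scenario is given below. The color-selection rule in it is a hypothetical example, not the rule analyzed in [2]: each robot remembers its last few puck sightings and picks red with probability equal to the fraction of red pucks in its buffer. Observations of other robots are ignored, and one puck sighting per robot per step is assumed.

```python
import random
from collections import deque


def simulate(num_robots, red, green, memory, steps, seed=0):
    """Return the fraction of red-foraging robots after each step."""
    rng = random.Random(seed)
    p_red_puck = red / (red + green)
    buffers = [deque(maxlen=memory) for _ in range(num_robots)]
    colors = [rng.choice("RG") for _ in range(num_robots)]
    history = []
    for _ in range(steps):
        for i in range(num_robots):
            # Assumption: each robot sees one puck per step, drawn from
            # the current red/green mix in the arena.
            buffers[i].append("R" if rng.random() < p_red_puck else "G")
            # Hypothetical rule: pick red with probability equal to the
            # fraction of red pucks in the memory buffer.
            frac_red = buffers[i].count("R") / len(buffers[i])
            colors[i] = "R" if rng.random() < frac_red else "G"
        history.append(colors.count("R") / num_robots)
    return history


history = simulate(num_robots=200, red=30, green=70, memory=10, steps=200)
late_average = sum(history[-100:]) / 100
print(f"average fraction of red robots: {late_average:.3f} (puck mix: 0.300)")
```

Under these assumptions the expected fraction of red robots equals the fraction of red pucks, which is the goal stated above. The buffer length trades responsiveness to changes in R and G against fluctuations in each robot's estimate.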

Archival  Publications: 

1. V. Crespi, G. Cybenko, G. Jiang, “The Theory of Trackability with Applications to Sensor
Networks”, ACM Transactions on Sensor Networks (special publication: 42 pages;
journal limit: 30), May 2008.

2. V. Crespi, A. Galstyan, K. Lerman, “Top-Down vs Bottom-up Methodologies in Multi-Agent
System Design”, Journal of Autonomous Robots, 2008.

3. G. Cybenko, V. Crespi, “Learning Hidden Markov Models using non-negative matrix
factorization”. To appear in IEEE Transactions on Information Theory, 2011. First
version, submitted in September 2008, is available to the public at arXiv:0809.4086v1,
2008.

Note:  Referenced  by  other  scientists,  e.g. 

http://www.cscs.umich.edu/~crshalizi/notabene/inference-markov.html, as one of the
most  relevant  papers  in  the  area. 

4.  In  preparation: 

• V. Crespi, G. Cybenko, “Statistical Learning of Stochastic Behaviors and Processes”.
Monograph, currently in preparation.

• V. Crespi, G. Cybenko, A. Giani, “Cognitive Complexification”. Currently in
preparation.